Transitions

This notebook takes a brief look at the importance of tessellation nodes and at the transitions between tiles.

Visualisation and the PageRank algorithm are used to find the more important nodes.

Transition network

Transition plots

Visualize transitions of taxis from one grid tile to another.

Hope: this should provide some insight into taxi/traffic flow.
How does traffic flow from one part of the city to another?


Steps to take:

1) Create trips
2) From trips extract edges (an edge connects two consecutive measurements inside a trip)

Sample at a 5-minute rate: keep the first measurement in each 5-minute window.
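The sampling rule can be sketched in plain Python as follows. This is a minimal illustration, not the notebook's own code: the `(taxi_id, timestamp)` record layout, the function name `keep_first_per_window` and the sample data are assumptions, and windows are assumed to be aligned to midnight.

```python
from datetime import datetime


def keep_first_per_window(records, window_minutes=5):
    """Keep each taxi's first measurement in every `window_minutes` window.

    `records` is an iterable of (taxi_id, timestamp) tuples (assumed layout).
    Windows are aligned to midnight, so `window_minutes` should divide 1440.
    """
    seen = set()
    kept = []
    for taxi_id, ts in sorted(records):
        minutes_since_midnight = ts.hour * 60 + ts.minute
        window = (taxi_id, ts.date(), minutes_since_midnight // window_minutes)
        if window not in seen:  # earliest measurement in this window
            seen.add(window)
            kept.append((taxi_id, ts))
    return kept


# Illustrative data only; the date is arbitrary.
records = [
    (1, datetime(2008, 2, 4, 0, 0, 10)),
    (1, datetime(2008, 2, 4, 0, 3, 0)),  # same 5-minute window, dropped
    (1, datetime(2008, 2, 4, 0, 5, 0)),
    (2, datetime(2008, 2, 4, 0, 1, 0)),
]
sampled = keep_first_per_window(records)
```

Because the records are sorted by taxi and time first, the first record seen in a window is also the earliest one.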

Creating trips

To avoid overly large hops between consecutive coordinates of a taxi, it is wise to partition the data into trips. This is done by checking whether the time difference between the previous measurement and the current one is larger than some threshold.

Here, a measurement starts a new trip if it is more than 15 minutes later than the previous measurement of the same taxi. The first measurement of each taxi always starts a new trip.

Example:

 Taxi_ID  Time
 1        00:00:00
 1        00:10:00
 1        00:15:00
 1        00:50:00
 2        00:00:00
 2        15:00:00

 Would be turned into

 Taxi_ID  Time      Trip_id
 1        00:00:00  1
 1        00:10:00  1
 1        00:15:00  1
 1        00:50:00  2
 2        00:00:00  3
 2        15:00:00  4
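A minimal single-pass sketch of this rule in plain Python is shown here. The function name `assign_trip_ids`, the tuple layout and the date attached to the example times are assumptions made for illustration; this is not necessarily how the notebook implements it.

```python
from datetime import datetime, timedelta


def assign_trip_ids(records, max_gap=timedelta(minutes=15)):
    """Label (taxi_id, timestamp) records with a running trip id.

    A new trip starts when the taxi changes or when the gap to the
    previous measurement of the same taxi exceeds `max_gap`.
    """
    labelled = []
    trip_id = 0
    prev_taxi, prev_time = None, None
    for taxi_id, ts in sorted(records):
        if taxi_id != prev_taxi or ts - prev_time > max_gap:
            trip_id += 1
        labelled.append((taxi_id, ts, trip_id))
        prev_taxi, prev_time = taxi_id, ts
    return labelled


def at(hour, minute):
    # Hypothetical helper: the date is arbitrary, only the time matters.
    return datetime(2008, 2, 4, hour, minute)


example = [
    (1, at(0, 0)), (1, at(0, 10)), (1, at(0, 15)), (1, at(0, 50)),
    (2, at(0, 0)), (2, at(15, 0)),
]
trip_ids = [trip for _, _, trip in assign_trip_ids(example)]
```

On the example table this gives the trip ids 1, 1, 1, 2, 3, 4. The data is sorted once and then scanned once, so the cost is dominated by the sort.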

Improved version

The efficient version appears to give the same result as the inefficient one.

This is a small additional win: the run time is down from 5 minutes to seconds.

Creating graph

Using the trip discretization from the previous step together with the tessellation, we can finally create some graphs!

First, let's take all of Monday's data and create a first graph.
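One way to turn trips into a weighted, directed edge list is sketched here, assuming each measurement has already been mapped to a tile and filtered to Mondays (for `datetime` objects, `timestamp.weekday() == 0`). The row layout `(trip_id, timestamp, tile)` and the function name `count_transitions` are hypothetical.

```python
from collections import Counter


def count_transitions(rows):
    """Count tile-to-tile transitions between consecutive measurements.

    `rows` is an iterable of (trip_id, timestamp, tile) tuples (assumed
    layout). Returns a Counter mapping (source_tile, target_tile) to the
    number of observed transitions, usable as edge weights.
    """
    edges = Counter()
    prev_trip, prev_tile = None, None
    for trip_id, _, tile in sorted(rows, key=lambda row: (row[0], row[1])):
        if trip_id == prev_trip:  # never connect measurements across trips
            edges[(prev_tile, tile)] += 1
        prev_trip, prev_tile = trip_id, tile
    return edges


# Illustrative rows: timestamps are plain integers, tiles are labels.
rows = [
    (1, 0, "A"), (1, 1, "B"), (1, 2, "B"),
    (2, 0, "B"), (2, 1, "A"),
]
edges = count_transitions(rows)
```

Here the edge `("B", "B")` is a self-edge, which appears whenever a taxi stays within one tile between two measurements.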

Results

We can see that even though two consecutive measurements within a trip are at most 15 minutes apart, there are still many edges that cross many tiles.

PageRank

For fun, let's compute the PageRank of our nodes. PageRank is calculated on the directed graph, but the plot is drawn from the undirected graph, as this is much faster here than drawing the directed graph.

Interpretation

PageRank is a measure of importance for the nodes. In the plot, the larger a node is drawn, the more important it is.

The importance of a node depends on which nodes have edges pointing to it and on how important those nodes are themselves.

We can see that the more important nodes are in the middle of the city. Also note the large point at the top-right of the plot. This should be the "train-station" we saw in the earlier analysis.

We get slightly different results by playing around with the damping factor, alpha.

If alpha is 0.95, then with probability 1 - 0.95 = 0.05 the walk jumps to a random node.
Otherwise, the probability of choosing the next node depends on the outgoing edge weights.
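To make the role of alpha concrete, here is a small power-iteration sketch of weighted PageRank in plain Python. It is an illustration under assumptions, not the routine used for the plots: the function name, the edge-dictionary input and the uniform handling of nodes without outgoing edges are choices made for this example.

```python
def pagerank(edges, alpha=0.85, iterations=200, tol=1e-12):
    """Weighted PageRank by power iteration.

    `edges` maps (source, target) to a positive weight. With probability
    `alpha` the walk follows an outgoing edge chosen in proportion to its
    weight; with probability 1 - alpha it jumps to a uniformly random node.
    Rank held by nodes without outgoing edges is spread uniformly.
    """
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_weight = {node: 0.0 for node in nodes}
    for (src, _), weight in edges.items():
        out_weight[src] += weight

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        base = (1 - alpha) / n + alpha * dangling / n
        new_rank = {node: base for node in nodes}
        for (src, dst), weight in edges.items():
            if weight > 0:
                new_rank[dst] += alpha * rank[src] * weight / out_weight[src]
        change = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Tiny illustrative graph: both A and B send traffic to C, C returns to A.
ranks = pagerank({("A", "C"): 1, ("B", "C"): 1, ("C", "A"): 2}, alpha=0.95)
```

In this toy graph B has no incoming edges, so its rank comes only from the random jump, while C collects rank from both A and B.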


We can see that some corners have rather high probability and weights. This might be due to self-edges: a node transitions to itself when the car is not moving. Let's see what happens if we remove all edges from a node to itself:
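Assuming the edges are stored as a mapping from `(source, target)` to a weight (an assumption made for this sketch, as is the function name), dropping self-edges is a one-line filter:

```python
def drop_self_edges(edges):
    """Return a copy of `edges` without edges from a node to itself."""
    return {(src, dst): w for (src, dst), w in edges.items() if src != dst}


# Illustrative weights only.
weighted = {("A", "A"): 40, ("A", "B"): 3, ("B", "A"): 2, ("B", "B"): 7}
moving_only = drop_self_edges(weighted)
```

The original mapping is left untouched, so both variants can be ranked and plotted side by side.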

We see a lot less probability assigned to the corners. We also see that the "train-station" node has become smaller.

This plot shows the most popular taxi locations fairly well. It also shows the tiles of the tessellation that no taxi really visits.

PageRank continued: different hours of the day

Let's partition the day into 3-hour windows and plot the above graphs again:

00:00 - 03:00
03:00 - 06:00
...

Normalize the node sizes according to the total number of logged GPS signals, relative to the partition with the most logs.

The normalizing constant is logs_in_this_partition / max_logs_in_partition
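The partitioning and the normalizing constant can be sketched as follows. The function name `partition_scales` and the sample timestamps are hypothetical; partition index 0 stands for 00:00 - 03:00, index 1 for 03:00 - 06:00, and so on.

```python
from collections import Counter
from datetime import datetime


def partition_scales(timestamps, hours_per_partition=3):
    """Map each partition index to logs_in_this_partition / max_logs_in_partition.

    Partitions that contain no logs are omitted from the result.
    """
    counts = Counter(ts.hour // hours_per_partition for ts in timestamps)
    if not counts:
        return {}
    max_logs = max(counts.values())
    return {part: counts[part] / max_logs for part in sorted(counts)}


# Illustrative timestamps; the date is arbitrary.
logs = [
    datetime(2008, 2, 4, 0, 30),
    datetime(2008, 2, 4, 1, 45),
    datetime(2008, 2, 4, 2, 59),
    datetime(2008, 2, 4, 4, 10),
]
scales = partition_scales(logs)
```

The busiest partition gets a scale of 1.0 and every other partition a value between 0 and 1, which can then be multiplied into the node sizes of that partition's plot.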

Note: I'm not running the code here, as it creates a figure for every partition. Instead, I'll display the final .gif.

Results

If nothing else, we can see more activity during the day from the line widths. During nighttime the central nodes lose a little importance, with the probability being spread more evenly across Beijing. In the morning we also see a larger spike in the top-right, which should be the route to the train station and airport.

Potential improvements

First of all, the node sizes should be normalized according to activity. This can be done using the number of active taxis during each time period.

At the moment, measurements can hop over grid tiles. In reality we do not teleport. There are two options to solve this:

1) Change the data: add interpolated points and increase the sampling frequency
2) Add additional edges inside trips, which would contain the connecting tile(s).
       This can be done with interpolation.
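Option 2 could look roughly like this, assuming tiles are addressed by integer `(row, column)` grid coordinates and that a straight line between two measurements is an acceptable approximation of the route. Both are assumptions, and the function names are hypothetical.

```python
def interpolate_tiles(start, end):
    """List the tiles on a straight line from `start` to `end`, inclusive.

    Consecutive tiles in the result differ by at most one row and one column.
    """
    (r0, c0), (r1, c1) = start, end
    d_row, d_col = r1 - r0, c1 - c0
    steps = max(abs(d_row), abs(d_col))
    if steps == 0:
        return [start]
    return [
        (r0 + round(i * d_row / steps), c0 + round(i * d_col / steps))
        for i in range(steps + 1)
    ]


def expand_edge(start, end):
    """Replace one long edge by a chain of edges between neighbouring tiles."""
    if start == end:
        return [(start, end)]  # keep self-edges unchanged
    path = interpolate_tiles(start, end)
    return list(zip(path, path[1:]))


# A hop over two tiles becomes three single-tile steps.
chain = expand_edge((0, 0), (0, 3))
```

Real taxis follow roads rather than straight lines, so the inserted tiles are only a rough guess at the connecting tiles.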